On Semi-Automated Web Taxonomy Construction
Abstract
This paper addresses the semi-automatic construction of taxonomies over the Web, specifically the problem of discovering high-quality resources that belong in a particular node of a taxonomy. We show that providing relevance feedback in a hyperlinked environment requires minimal additional effort and yields a significant and consistent improvement in quality. This feedback is especially valuable for topics for which high-quality pages are harder to find. En route, we describe novel algorithms for hyperlink relevance feedback.
Similar resources
TaxaMiner: an experimentation framework for automated taxonomy bootstrapping
Hierarchical taxonomies and thesauri are frequently used by content management systems for indexing, search and categorization. They are also being viewed as rudimentary ontologies for the emerging Semantic Web infrastructure. However, to date, the development of taxonomies and thesauri has been a human-intensive process, requiring huge resources in terms of cost and time. It is critical that approaches...
Visual divisive hierarchical clustering using k-means
This paper presents a browser-based semi-automatic taxonomy construction tool, Vd-chuck, which is able to incorporate text and data mining algorithms into a user-friendly interface. The presented system is browser-based. Its unsupervised learning for concept suggestion and different visualization techniques assist the user with textual and numerical data analysis. We tested the Vdchuck system on a ...
A Graph-Based Algorithm for Inducing Lexical Taxonomies from Scratch
In this paper we present a graph-based approach aimed at learning a lexical taxonomy automatically starting from a domain corpus and the Web. Unlike many taxonomy learning approaches in the literature, our novel algorithm learns both concepts and relations entirely from scratch via the automated extraction of terms, definitions and hypernyms. This results in a very dense, cyclic and possibly di...
METEOR–S WSDI: A Scalable Infrastructure of Registries for Semantic Publication and Discovery of Web Services
Web services are the new paradigm for distributed computing. They have much to offer towards interoperability of applications and integration of large-scale distributed systems. To make Web services accessible to users, service providers use Web service registries to publish them. The current infrastructure of registries requires replication of all Web service publications in all Universal Business...
Bootstrapping Information Extraction from Semi-structured Web Pages
We consider the problem of extracting structured records from semi-structured web pages with no human supervision required for each target web site. Previous work on this problem has either required significant human effort for each target site or used brittle heuristics to identify semantic data types. Our method only requires annotation for a few pages from a few sites in the target domain. T...